Using the idea of coverage that was introduced with simulation studies, we can assess the coverage of bootstrapped confidence intervals.
Consider a random sample from a standard normal distribution.
x
[1] -0.169417734 -0.618778065 0.889289947 0.743545975 0.258047445 -0.007841503
[7] -1.304085671 -1.842335987 -0.277600642 -1.612780873 0.511142431 0.628909857
[13] -0.361393835 -0.214121337 0.772547387 0.433494812 0.023510859 0.168546163
[19] -0.161329600 0.264790641 0.493521998 -0.307056166 -1.365778025 0.426529696
[25] -0.562316866 1.654235108 -1.086048084 -0.227679632 -1.468276619 0.783348415
[31] 0.190550082 0.519711800 -0.136788620 0.500098666 0.102446975 -0.763293537
[37] 0.334649480 -0.853774135 -0.862946417 -1.064681908 0.583762658 -0.049363852
[43] -0.190355104 -0.862702244 -0.603235304 0.121753100 -0.045877444 -0.029367173
[49] 0.372187947 -0.941077258
Given this, we want to estimate the 25th quantile value.
quantile(x, 0.25)
25%
-0.6148924
We can build a confidence interval around this estimate using the bootstrap method.
quantile(bs_samples, c(0.025, 0.975))
2.5% 97.5%
-0.9410773 -0.2139728
bootstrap_ci(x, function(x) quantile(x, 0.25))
2.5% 97.5%
-0.9410773 -0.2182428
We also can calculate the actual value of the 25th quantile of the standard normal distribution.
(truth <- qnorm(0.25))
[1] -0.6744898
Given that we know the actual value, we can use a simulation study to assess the coverage of our bootstrap confidence interval.
rr library(tictoc) n_reps <- 1000
tictoc::tic()
bs_ci_values <- replicate(n_reps, bootstrap_ci(rnorm(n), function(x) quantile(x, 0.25)))
tictoc::toc()
83.222 sec elapsed
bs_ci_values[,1:5]
[,1] [,2] [,3] [,4] [,5]
2.5% -0.82931940 -0.9722685 -1.2333750 -0.8899601 -1.3250445
97.5% -0.05235155 -0.5857609 -0.3119804 -0.4367608 -0.2188246
Notice that the above code takes a long time to run. Why?
We can speed things up by running our code in parallel! First, we need to install a couple of useful packages, future.apply and furrr. future.apply provides parallelized versions of the R apply functions while furrr provides parallelized versions of the purrr map functions.
install.packages(c("future.apply", "furrr"))
library(future.apply)
Loading required package: future
library(furrr)
Now, we need to setup our computer to take advantage of the cores we have available.
availableCores()
system
8
future_replicate().tic()
bs_ci_values <- future_replicate(n_reps, bootstrap_ci(rnorm(n), function(x) quantile(x, 0.25)))
toc()
22.613 sec elapsed
dim(bs_ci_values)
[1] 2 1000
tic()
bs_ci_list <- future_map(1:n_reps,
function(x) bootstrap_ci(rnorm(n), function(x) quantile(x, 0.25)))
toc()
23.416 sec elapsed
length(bs_ci_list)
[1] 1000
head(bs_ci_list)
[[1]]
2.5% 97.5%
-0.8573536 -0.1247388
[[2]]
2.5% 97.5%
-1.351951 -0.439162
[[3]]
2.5% 97.5%
-0.8703554 -0.2306619
[[4]]
2.5% 97.5%
-1.01540445 -0.03089471
[[5]]
2.5% 97.5%
-0.8405373 -0.4936884
[[6]]
2.5% 97.5%
-1.1808770 -0.6686824
Now that we have these confidence interval values, we can estimate coverage by determining what proportion of intervals contain the true value.
(coverage <- mean(purrr::map_lgl(bs_ci_list, ~.[1] <= truth & .[2] >= truth)))
[1] 0.939
(coverage <- mean(purrr::map_lgl(bs_ci_list, ~prod(. - truth) < 0)))
[1] 0.939
coverage_ci <- coverage + c(-1, 1)*qnorm(0.975)*sqrt(coverage*(1 - coverage)/n_reps)
c(lower = coverage_ci[1],
coverage = coverage,
upper = coverage_ci[2])
lower coverage upper
0.9241664 0.9390000 0.9538336